ABSTRACT

Interchange arguments are applied to establish the optimality of priority list
policies in three problems. First, we prove that in a multi-class tandem of two
·/M/1 queues it is always optimal in the second node to serve according to the
"cμ" rule. The result holds more generally if the first node is replaced by a
multi-class network consisting of ·/M/1 queues with Bernoulli routing. Next, for
scheduling a single server in a multi-class node with feedback, a simplified proof
of Klimov's result is given. From it follows the optimality of the index rule
among idling policies for general service time distributions, and among
pre-emptive policies when the service time distributions are exponential. Lastly, we
consider the problem of minimizing the blocking in a communication link with
lossy channels and exponential holding times.
1. Introduction

This paper has two main aims. The first is to demonstrate the use of interchange arguments
in proving optimality properties, and the second is to obtain new results in stochastic scheduling.
The main idea of our arguments appears in Varaiya et al [14], where it is used to obtain
the optimality of index rules in multi-armed bandit problems. There, the objective is to maximize
the expected total discounted reward. We use variations of this idea together with pathwise
coupling techniques.

We first apply an interchange argument in Section 2 to partially characterize the optimal
policy for scheduling two servers in a tandem of two nodes with M different classes of customers
with exponential service times. The result, motivated by Ross and Yao [11], is that the optimal
policy in the second node is a "cμ" rule. This is an easy extension of the results of Baras et al [2]
and Buyukkoc et al [3]. The result can be extended to the case where the first node is a network
consisting of ·/M/1 queues with Bernoulli routing.

Next, in Section 3 the problem of Klimov [7] is considered. A single server is to be
scheduled in a network of M/GI/1 nodes. The objective is to minimize the expected long-term
average cost. It has been shown in [14] that this problem is equivalent to a multi-armed bandit
problem. Our argument provides a simple proof of the result in [7] that the nonpre-emptive
non-idling optimal policy is a priority rule. We also establish the optimality of that rule among
idling policies. The priorities are determined explicitly, and for the case where the service distributions are
exponential we show that the same priority rule is optimal among pre-emptive policies. Remarkably,
the optimal policy does not depend on the arrival rate. Our proof provides some insight
into this fact.

Finally, in Section 4 we consider a problem of stochastic scheduling that does not fall in the
framework of multi-armed bandit problems. Calls arrive at a communication link where N channels
are available. There are probabilities of immediate loss associated with each channel, and a
successful call occupies the channel for an exponentially distributed amount of time. If the holding times are
all independent and identically distributed, Anantharam et al [1] show that the time to reach the
state where all channels are full is independent of the placement policy used. We provide a simple
proof of this result and further prove that, in the case where the holding times are not identically
distributed, the time to reach the full state is stochastically maximized by assigning calls to
the free channel with the shortest mean holding time.


2.  Server  scheduling  in  a  multi-class  network 

Consider two ·/M/1 queues in tandem with M classes of jobs. Jobs arrive at the first node
at deterministic time instants $\{a_k\}_k$. The service rates at the first node are $\{\mu_i\}_i$ and there are
associated holding costs denoted by $\{c_i\}_i$. In the second node the corresponding service rates
are $\{\nu_i\}_i$ and the holding costs are $\{d_i\}_i$. Let $x_t = (x_t^1, \dots, x_t^M)$ (respectively $y_t = (y_t^1, \dots, y_t^M)$)
be the vector of class populations in node 1 (respectively node 2). Assume that
$d_1\nu_1 \ge d_2\nu_2 \ge \cdots \ge d_M\nu_M$. A pre-emptive server allocation policy is a function

$$\pi : (x_t, y_t) \mapsto \big(\pi_t^1(x_t, y_t), \pi_t^2(x_t, y_t)\big) \in \{1, 2, \dots, M\}^2.$$


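The objective is to minimize over π the expected discounted cost incurred in the interval
[0, T], given by

$$J(\pi, T) = E\Big[\int_0^T e^{-\alpha t}\Big(\sum_{i=1}^M c_i x_t^i + \sum_{i=1}^M d_i y_t^i\Big)\,dt\Big], \qquad (2.1)$$

where α denotes the discount rate.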


The following result shows that a "dν" policy is optimal for the second node. It is an
extension of results in Baras et al [2] and Buyukkoc et al [3], who consider a single node. It also
provides an extension to results in Ross and Yao [11], who consider multi-server scheduling in a
network.


Theorem 2.1: In node 2 the optimal policy always serves a job of class i, among the classes present in
the queue, for which the quantity $d_i\nu_i$ is maximum.
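A minimal Python sketch of the decision in Theorem 2.1 is given below. The function `dnu_choice` and its argument layout are hypothetical; classes are numbered from 0, and ties are broken in favour of the lowest index, consistent with the ordering $d_1\nu_1 \ge d_2\nu_2 \ge \cdots \ge d_M\nu_M$ assumed above.

```python
from typing import Optional, Sequence


def dnu_choice(y: Sequence[int], d: Sequence[float], nu: Sequence[float]) -> Optional[int]:
    """Class to serve at node 2 under the d*nu rule (0-based), or None if node 2 is empty.

    y[i]  : number of class-i jobs present at node 2 (the vector y_t)
    d[i]  : holding cost rate of class i at node 2
    nu[i] : exponential service rate of class i at node 2
    """
    present = [i for i, count in enumerate(y) if count > 0]
    if not present:
        return None
    # Among the classes present, serve one with the largest d_i * nu_i;
    # max() keeps the first maximiser, i.e. the lowest class index on ties.
    return max(present, key=lambda i: d[i] * nu[i])


# Class 0 has the largest d*nu overall but is absent, so class 2 (d*nu = 1.6) is chosen.
print(dnu_choice([0, 3, 1], d=[5.0, 1.0, 2.0], nu=[1.0, 1.0, 0.8]))
```

Only the products $d_i\nu_i$ of the classes currently present matter; the state of node 1 plays no role in this choice.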

Proof: The virtual service process of an exponential server with rate μ is a Poisson point
process with parameter μ. A point of this process is a service completion if the queue is nonempty.
Let $\{u_n^i\}$ (respectively $\{s_n^i\}$) be the points of the virtual service process for class
$i \in \{1, \dots, M\}$ in node 1 (respectively in node 2). We need only consider policies π switching at
times

$$\{t_n\} = \{a_n\} \cup \bigcup_{i=1}^M \{u_n^i\} \cup \bigcup_{i=1}^M \{s_n^i\}.$$

For T > 0, condition on the number of points of the process $\{t_n\}$ in the interval [0, T], say
$0 < t_1 < t_2 < \cdots < t_k < T < t_{k+1} < \cdots$. Optimality will be proved by induction on k. The
result is trivially true for k = 0; assume that it holds for k = 1, ..., K. We will prove that
the result remains true for k = K + 1.

By the optimality principle and the induction hypothesis, $\pi^2(\cdot,\cdot)$ has to follow the dν rule
at times $t_1, t_2, \dots, t_K$; suppose that $\pi_0^2 = i$ while $y_0^j > 0$ with i > j. Then policy π cannot be
optimal, because it can be improved as follows. Denote by $t_\sigma$ the first time when $\pi_t^2 = j$, and
define policy $\tilde\pi$ by

$$\tilde\pi_t^1 = \pi_t^1,\ t \ge 0; \qquad \tilde\pi_0^2 = j; \qquad \tilde\pi_{t_k}^2 = \pi_{t_k}^2,\ k = 1, \dots, \sigma - 1;$$

$$\tilde\pi_{t_\sigma}^2 = i; \qquad \tilde\pi_{t_k}^2 = \pi_{t_k}^2,\ k = \sigma + 1, \sigma + 2, \dots$$

Then simple algebraic manipulation shows that

$$J(\pi, K+1) - J(\tilde\pi, K+1) \ge 0$$

provided that

$$P(j, t_k)\,d_j - P(i, t_k)\,d_i \ge 0, \qquad k = 1, \sigma,$$

where

$$P(l, t_k) = \Pr\{t_k \in \{s_n^l\}\}, \qquad l = i, j,\ k = 1, \sigma.$$

It is easy to verify that, because $\{a_n\}$ is a deterministic process and the processes $\{u_n^i\}$, $\{s_n^j\}$ are
Poisson,

$$\frac{P(j, t_k)}{P(i, t_k)} = \frac{\nu_j}{\nu_i}, \qquad k = 1, \sigma.$$


Therefore, $J(\pi, K+1) - J(\tilde\pi, K+1) \ge 0$, since $d_j\nu_j \ge d_i\nu_i$, and π can be improved upon by $\tilde\pi$. A similar
argument shows that a policy that idles in node 2 at time 0 cannot be optimal. Note that policies π
and $\tilde\pi$ are not feasible, because they are allowed to switch at all the points of the process $\{t_n\}$, some
of which are not observable. The above argument shows that the dν rule is optimal among all
such policies. Since the dν rule is itself a feasible policy, it is optimal among feasible policies as
well.


□ 
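The ratio $P(j, t_k)/P(i, t_k) = \nu_j/\nu_i$ used in the proof rests on a standard property of independent Poisson processes: in their superposition, a point belongs to component l with probability proportional to its rate. The Python sketch below checks this numerically by counting points on a long horizon. The function `class_shares`, the rates and the horizon are illustrative assumptions, and the deterministic arrival instants are left out since they do not affect the ratio.

```python
import random


def class_shares(rates, horizon, seed=0):
    """Share of the points of a superposition of independent Poisson processes
    that come from each component, counted on [0, horizon)."""
    rng = random.Random(seed)
    counts = []
    for rate in rates:
        n, t = 0, rng.expovariate(rate)
        while t < horizon:  # points of this component inside [0, horizon)
            n += 1
            t += rng.expovariate(rate)
        counts.append(n)
    total = sum(counts)
    return [n / total for n in counts]


nu = [2.0, 0.5, 1.5]  # illustrative node-2 service rates
shares = class_shares(nu, horizon=50_000.0)
print([round(s, 3) for s in shares])        # empirical shares
print([round(r / sum(nu), 3) for r in nu])  # nu_l / sum(nu) = 0.5, 0.125, 0.375
```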


Remarks  2.1 

(a) In the above proof, policies π and $\tilde\pi$ result in identical arrivals for node 2. The proof,
except for the embedding, is identical to the one in Buyukkoc et al [3].

(b) The result stated in Theorem 2.1 remains true if node 1 is replaced by a network of
·/M/1 queues with Bernoulli routing (see Figure 2.1), and the cost function in (2.1) is modified in
the obvious way.

3.  Klimov’s  problem 

3.1.  The  problem 

The following situation was considered in Klimov [7]. There are N queues. The service
times are independent and have the distribution function $G_i(t)$ in queue i ($1 \le i \le N$). Customers
arrive as an independent Poisson process with rate λ and are assigned to queue i with probability
$p_i$. Write $p = (p_1, \dots, p_N)$. Upon service completion in queue i, a customer is sent
to queue j with probability $p_{ij}$, and leaves the network with probability $p_{i0} := 1 - \sum_{j=1}^N p_{ij}$,
independently of the state of the network. There is a single server that is allocated to one of the
nodes at a time, in a nonpre-emptive way.

Assumptions

1. The matrix $P = [p_{ij},\ 1 \le i, j \le N]$ is such that every customer eventually leaves, i.e.,
$P^n \to 0$ as $n \to \infty$. In particular, this implies that $(I - P)$ is invertible.

2. It is also assumed that $\int_0^\infty t\, dG_i(t) =: \beta_i < \infty$, for $1 \le i \le N$.

3. Finally, one assumes that $\lambda p (I - P)^{-1}\beta < 1$, where $\beta = (\beta_1, \dots, \beta_N)^T$ ($(\cdot)^T$ denotes
transposition).
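Assumptions 1 and 3 can be checked numerically for given parameters. The sketch below (Python with NumPy; the function name and the example numbers are illustrative assumptions) uses the standard fact that $P^n \to 0$ if and only if the spectral radius of P is below 1, and then evaluates the load $\lambda p (I - P)^{-1}\beta$.

```python
import numpy as np


def check_assumptions(lam, p, P, beta):
    """Check Assumptions 1 and 3 for given (hypothetical) parameters.

    Returns (transient, rho): `transient` is True when the spectral radius of P
    is below 1, which is equivalent to P^n -> 0; `rho` is lam * p (I - P)^{-1} beta,
    or None when Assumption 1 fails and (I - P) need not be invertible.
    """
    P = np.asarray(P, dtype=float)
    p = np.asarray(p, dtype=float)
    beta = np.asarray(beta, dtype=float)
    transient = bool(np.max(np.abs(np.linalg.eigvals(P))) < 1.0)
    if not transient:
        return False, None
    # Row vector of mean visit rates per node: lam * p (I - P)^{-1}.
    visits = lam * p @ np.linalg.inv(np.eye(len(p)) - P)
    return True, float(visits @ beta)


# Two nodes: every arrival joins node 1; half of the jobs then go on to node 2.
ok, rho = check_assumptions(0.4, [1.0, 0.0], [[0.0, 0.5], [0.0, 0.0]], [1.0, 1.0])
print(ok, rho)  # Assumption 3 requires rho < 1
```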

Denote by $Z_t^i$ the number of customers at time $t \ge 0$ in queue $i \in \{1, \dots, N\}$ and let
$Z_t = (Z_t^1, \dots, Z_t^N)$. Fix $c_i > 0$ for $1 \le i \le N$ such that $\sum_{i=1}^N c_i = 1$. For a given server
allocation policy π, one defines the average waiting cost per unit of time as

$$J(\pi) = \liminf_{T \to \infty} \frac{1}{T} E\Big[\int_0^T \sum_{i=1}^N c_i Z_t^i\, dt\Big]. \qquad (3.1)$$


A  policy  is  said  to  be  admissible  if  it  is  non-idling,  nonpre-emptive  and  nonanticipative. 
Non-idling  means  that  the  server  is  idle  only  when  the  system  is  empty.  Nonanticipative  means 
that  the  decision  to  allocate  the  server  to  queue  i  at  time  t  >  0  is  based  on  the  evolution  of  the 
network  up  to  time  t  .  Under  Assumptions  1,2  and  3  the  system  is  ergodic  under  any  non-idling 
policy  (see  Section  3.2). 

A policy is optimal if it minimizes the cost (3.1) over all the admissible policies. The problem
is to find an optimal policy.

Outline 

We  give  a  simple  proof  and  provide  two  extensions  of  the  result  in  [7].  Specifically,  we  show 
that  a  priority  list  policy  that  serves  the  non-empty  node  with  the  highest  priority  is  optimal. 
Remarkably,  the  priorities  do  not  depend  on  the  parameters  of  the  arrival  process.  In  Section  3.2 
we  discuss  the  effects  of  idling  in  the  simple  case  of  two  nodes  with  no  feedback.  Some  auxiliary 
calculations  are  performed  in  Section  3.3  and  are  used  subsequently  in  Section  3.4  to  derive  a 
priority  index  for  each  node.  The  optimality  results  extend  to  the  case  of  pre-emptive  policies  for 
nodes  where  the  service  distributions  are  exponential. 


3.2.  The  busy  period 

Decomposition 

Convention: The set of nodes is partitioned into $\mathbf{n} = \{1, \dots, n\}$ and $\mathbf{n}^c = \{n+1, \dots, N\}$.

Assume that while the nodes in $\mathbf{n}$ are not all empty the server serves according to the priority
rule $1 > 2 > \cdots > n$.

Notation: For a matrix M and sets of natural numbers A and B, $M_{AB}$ denotes the matrix
$\{M_{ij},\ i \in A,\ j \in B\}$. Similar notation for vectors has the obvious meaning.

Definition: Let $B^{(n)}$ be the time it takes to empty the nodes in $\mathbf{n}$, i.e.,

$$B^{(n)} = \inf\{t > 0 \mid Z_t^{\mathbf{n}} = 0\}.$$

We represent the queueing process in the system during a busy period as a collection of
trees (see Feller [4]). A job arriving in node i with service requirement $S_i$ is represented as a
node of type i with weight $S_i$. The children of each node are the jobs arriving in the system while
that customer is being served. Each job initially present in the system is the root of a tree, and the
length of a busy period is the sum of the weights of these trees. It is shown in [13] that under
Assumptions 2 and 3 one has $E[B^{(n)}] < \infty$ for all n. It is easy to see from this construction that

Fact: The random variable $B^{(n)}$ does not depend on the order of service.


Furthermore, one obtains a decomposition for the mean of a busy period. Let $e_i$ be the i-th
unit vector in $\mathbb{R}^N$ and, for any Z(0), write $Z(0) = m_1 e_1 + \cdots + m_N e_N$. Then,

$$E[B^{(n)} \mid Z(0)] = \sum_{i=1}^N m_i\, E[B^{(n)} \mid e_i]. \qquad (3.2)$$
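As an illustration of (3.2), the sketch below computes $E[B^{(n)} \mid e_i]$ from a linear system. This system is not stated in the text; it is our own first-step reading of the tree construction above (each job contributes its own service time, the sub-busy periods of the external arrivals to $\mathbf{n}$ during that service, and the busy period of the node it is routed to). Nodes are numbered from 0, and all names are hypothetical.

```python
import numpy as np


def busy_period_means(lam, p, P, beta, nodes):
    """b[i] = E[B^(n) | e_i] for every node i, where `nodes` is the index set n.

    Assumed first-step relation (our reading of the tree construction): for i in n,
        b_i = beta_i + lam * beta_i * sum_{m in n} p_m b_m + sum_{l in n} p_il b_l.
    Jobs in n^c are never served before n empties, so b_i = 0 for i outside n.
    """
    P = np.asarray(P, dtype=float)
    p = np.asarray(p, dtype=float)
    beta = np.asarray(beta, dtype=float)
    b = np.zeros(len(beta))
    idx = sorted(nodes)
    if idx:
        A = np.eye(len(idx)) - P[np.ix_(idx, idx)] - lam * np.outer(beta[idx], p[idx])
        b[idx] = np.linalg.solve(A, beta[idx])
    return b


def mean_busy_period(z0, b):
    """Decomposition (3.2): E[B^(n) | Z(0)] = sum_i m_i E[B^(n) | e_i]."""
    return float(np.dot(z0, b))


# Single M/G/1 node: the classical mean busy period beta / (1 - lam * beta) = 4.
b = busy_period_means(0.25, [1.0], [[0.0]], [2.0], {0})
print(b[0], mean_busy_period([3], b))
```

The single-node check recovers the familiar M/G/1 value $\beta/(1 - \lambda\beta)$, which is a useful sanity test of the assumed relation.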
3.3. Auxiliary calculations

Probabilities  of  transition 

In this section we calculate the transition probabilities of a customer exiting the set of nodes
$\mathbf{n}$, i.e., we are interested in the quantity

$r_{ij}^{(n)}$: indicator function of the event that node $j \in \mathbf{n}^c$ is the first node not in $\mathbf{n}$ that a job visits,
starting in node i.

The probabilities are given by

$$p_{ij}^{(n)} = E[r_{ij}^{(n)}]. \qquad (3.3)$$


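Lemma 3.1: The probabilities defined in (3.3) above are given, in matrix form, by

$$P^{(n)}_{\mathbf{n}^c\mathbf{n}^c} = P_{\mathbf{n}^c\mathbf{n}^c} + P_{\mathbf{n}^c\mathbf{n}}(I - P_{\mathbf{n}\mathbf{n}})^{-1}P_{\mathbf{n}\mathbf{n}^c}. \qquad (3.4)$$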


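Proof: For $i, j \in \mathbf{n}^c$ and $l \in \mathbf{n}$ we write the following first step equations:

$$p_{ij}^{(n)} = p_{ij} + \sum_{l \in \mathbf{n}} p_{il}\, p_{lj}^{(n)}, \qquad (3.5)$$

$$p_{lj}^{(n)} = \sum_{l' \in \mathbf{n}} p_{ll'}\, p_{l'j}^{(n)} + p_{lj}. \qquad (3.6)$$

The result is obtained by writing the above equations in matrix form, solving (3.6) for $P^{(n)}_{\mathbf{n}\mathbf{n}^c}$,
and substituting into (3.5).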


□ 
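A direct transcription of (3.4) in Python with NumPy, as a sketch (0-based node labels; the function name is ours). It solves a linear system rather than forming $(I - P_{\mathbf{n}\mathbf{n}})^{-1}$ explicitly, which is the usual numerically preferable choice.

```python
import numpy as np


def exit_probabilities(P, nodes):
    """Matrix P^(n)_{n^c n^c} of (3.4), rows/columns ordered as sorted(n^c).

    Entry (i, j): probability that a job routed out of node i in n^c reaches node j
    as the first node outside n that it visits (it may first pass through n).
    """
    P = np.asarray(P, dtype=float)
    inside = sorted(nodes)
    outside = [k for k in range(P.shape[0]) if k not in nodes]
    Pcc = P[np.ix_(outside, outside)]
    if not inside:
        return Pcc  # n empty: the first node outside n is simply the next node
    Pcn = P[np.ix_(outside, inside)]
    Pnn = P[np.ix_(inside, inside)]
    Pnc = P[np.ix_(inside, outside)]
    # (I - P_nn)^{-1} P_{n n^c} via a linear solve instead of an explicit inverse.
    return Pcc + Pcn @ np.linalg.solve(np.eye(len(inside)) - Pnn, Pnc)


P = [[0.0, 0.5], [0.3, 0.0]]
print(exit_probabilities(P, {0}))  # 0.3 * 0.5: out of node 1, through node 0, back to node 1
```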


Expected  sojourn  times 

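We now calculate the expected sojourn time for each passage of a customer through the set
of nodes $\mathbf{n}$. For this define

$S_j^{(n)}$: total amount of service that a job receives in $\mathbf{n}$ after completing service at node $j \in \mathbf{n}^c$,
until it exits $\mathbf{n}$.

We set

$$T_j^{(n)} = E[S_j^{(n)}]. \qquad (3.7)$$

Lemma 3.2: The expected sojourn times defined in (3.7) are given by

$$T^{(n)}_{\mathbf{n}^c} = P_{\mathbf{n}^c\mathbf{n}}(I - P_{\mathbf{n}\mathbf{n}})^{-1}\beta_{\mathbf{n}}. \qquad (3.8)$$

Proof: For $j \in \mathbf{n}^c$ and $l \in \mathbf{n}$, first step equations give

$$T_j^{(n)} = \sum_{l \in \mathbf{n}} p_{jl}\, T_l^{(n)}, \qquad T_l^{(n)} = \beta_l + \sum_{l' \in \mathbf{n}} p_{ll'}\, T_{l'}^{(n)},$$

where, for $l \in \mathbf{n}$, $T_l^{(n)}$ denotes the expected service received in $\mathbf{n}$ by a job starting in node l, its own
service included. The proof then proceeds as in Lemma 3.1 above.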


□ 
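Formula (3.8) can be sketched the same way (Python with NumPy, 0-based labels, hypothetical function name), solving the second first-step equation for the nodes in $\mathbf{n}$ and then applying the first one.

```python
import numpy as np


def sojourn_service(P, beta, nodes):
    """T^(n)_j of (3.8) for every j in n^c, returned as a dict {j: T_j}.

    T_j is the expected service a job receives in n after completing service at
    j in n^c and before it next leaves n (0 if it is never routed into n).
    """
    P = np.asarray(P, dtype=float)
    beta = np.asarray(beta, dtype=float)
    inside = sorted(nodes)
    outside = [k for k in range(P.shape[0]) if k not in nodes]
    if not inside:
        return {j: 0.0 for j in outside}
    # For l in n: T_l = beta_l + sum_{l' in n} p_{l l'} T_{l'}, i.e. (I - P_nn) T_n = beta_n.
    T_inside = np.linalg.solve(np.eye(len(inside)) - P[np.ix_(inside, inside)], beta[inside])
    # For j in n^c: T_j = sum_{l in n} p_{j l} T_l.
    T_out = P[np.ix_(outside, inside)] @ T_inside
    return {j: float(t) for j, t in zip(outside, T_out)}


# Node 1 is routed into n = {0} w.p. 0.3, where it then needs beta_0 = 2.0 on average.
print(sojourn_service([[0.0, 0.5], [0.3, 0.0]], [2.0, 1.0], {0}))
```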

We next turn our attention to a quantity that will be important in the computation of
priority indices in Section 3.4. Define

$R_j^{(n)}$: total amount of time needed to clear the set of nodes $\mathbf{n}$ of the arrivals resulting from
serving a job in node $j \in \mathbf{n}^c$ and throughout its sojourn in $\mathbf{n}$.

Lemma 3.3: The expectation of $R_j^{(n)}$ is given by

$$E[R_j^{(n)}] = \lambda\big(\beta_j + T_j^{(n)}\big)\sum_{i=1}^N p_i\, E[B^{(n)} \mid e_i]. \qquad (3.9)$$
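Proof: Denote by $A(t) = (A_1(t), \dots, A_N(t))$ the vector of the numbers of external arrivals to each node
in the interval [0, t], and by $S_j$ the service time of the job at node j. The result then follows
easily from relationship (3.2) by noting that

$$E[R_j^{(n)}] = E\big[E[B^{(n)} \mid A(S_j + S_j^{(n)})]\big] = \sum_{i=1}^N E\big[A_i(S_j + S_j^{(n)})\big]\, E[B^{(n)} \mid e_i] = \lambda\big(\beta_j + T_j^{(n)}\big)\sum_{i=1}^N p_i\, E[B^{(n)} \mid e_i].$$

The last step follows from the fact that the arrival process is Poisson and independent of $S_j + S_j^{(n)}$.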


□ 
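Formula (3.9) in code, as a sketch with hypothetical names. The inputs $T_j^{(n)}$ and $E[B^{(n)} \mid e_i]$ are assumed to have been computed beforehand, as in Lemma 3.2 and Section 3.2.

```python
def expected_clearing_time(lam, p, beta_j, T_j, b):
    """E[R_j^(n)] from (3.9).

    lam    : rate of the external Poisson arrival stream
    p      : p[i], probability that an arrival is assigned to node i
    beta_j : mean service time at node j in n^c
    T_j    : T_j^(n) from Lemma 3.2
    b      : b[i] = E[B^(n) | e_i] (zero for i outside n), as in (3.2)
    """
    # On average lam * (beta_j + T_j) arrivals occur during the passage of the job;
    # each joins node i w.p. p[i] and then needs E[B^(n) | e_i] to be cleared.
    return lam * (beta_j + T_j) * sum(pi * bi for pi, bi in zip(p, b))


# Hypothetical numbers: n = {node 0} with beta_0 = 1, so b_0 = 1 / (1 - lam * p_0) = 4/3.
print(expected_clearing_time(0.5, [0.5, 0.5], beta_j=1.0, T_j=0.0, b=[4 / 3, 0.0]))
```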


3.4.  Optimality 

The  nonpre-emptive  case 

In this section we prove that, as mentioned in Section 3.1, the policy that minimizes the
cost defined in (3.1) is a priority list, which we determine. Since the length of a busy period does not
depend on the order of service (Fact in Section 3.2), a renewal-reward argument shows that an optimal
policy also minimizes the expected cost in each busy period of the system, given by

$$J(\pi, B) = E\Big[\int_0^B c \cdot Z(t)\, dt\Big].$$

We give expressions for the priority index of each node.

First, the nodes {1, 2, ..., N} are renumbered as follows. Assign number 1 to the node that
maximizes the quantity

$$\frac{c_i - \sum_{k=1}^N p_{ik} c_k}{\beta_i}, \qquad i = 1, \dots, N. \qquad (3.10)$$

Recursively, for $1 \le n < N$, assign the number n + 1 to the node $i \in \mathbf{n}^c$ that maximizes the quantity

$$\frac{c_i - \sum_{k \in \mathbf{n}^c} p_{ik}^{(n)} c_k}{\beta_i + T_i^{(n)}},$$

where $\mathbf{n}$ is the set of nodes {1, ..., n} in the new numbering. Denote by π the priority assignment
list that corresponds to this ordering.
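The recursive renumbering can be sketched in Python with NumPy as follows (0-based labels, hypothetical function name). Each step recomputes $P^{(n)}$ and $T^{(n)}$ from (3.4) and (3.8) for the current set $\mathbf{n}$, which is simple but not the most efficient implementation.

```python
import numpy as np


def klimov_order(c, beta, P):
    """Priority list of Section 3.4, highest priority first (0-based node labels).

    At each step the node i outside the already ranked set n maximizes
        (c_i - sum_{k in n^c} p_ik^(n) c_k) / (beta_i + T_i^(n)),
    with P^(n) from (3.4) and T^(n) from (3.8); for n empty this is (3.10).
    """
    c = np.asarray(c, dtype=float)
    beta = np.asarray(beta, dtype=float)
    P = np.asarray(P, dtype=float)
    N = len(c)
    ranked = []  # the set n, in the order of the new numbering
    while len(ranked) < N:
        out = [k for k in range(N) if k not in ranked]  # the set n^c
        Pcc = P[np.ix_(out, out)]
        T = np.zeros(len(out))
        if ranked:
            Pcn = P[np.ix_(out, ranked)]
            M = np.eye(len(ranked)) - P[np.ix_(ranked, ranked)]
            Pcc = Pcc + Pcn @ np.linalg.solve(M, P[np.ix_(ranked, out)])  # (3.4)
            T = Pcn @ np.linalg.solve(M, beta[ranked])  # (3.8)
        index = (c[out] - Pcc @ c[out]) / (beta[out] + T)
        ranked.append(out[int(np.argmax(index))])
    return ranked


# Without feedback (P = 0) the index reduces to c_i / beta_i = 0.2, 0.25, 0.6 here.
print(klimov_order([0.2, 0.5, 0.3], [1.0, 2.0, 0.5], np.zeros((3, 3))))
```

With P = 0 the rule reduces to a cμ-type rule; with feedback, a node whose jobs merely move on to equally costly nodes gets a low index, since serving it does not lower the cost rate.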

Theorem 3.1: Policy π is optimal among all nonpre-emptive, non-idling policies.

Proof: Let $J(\pi'\pi, B)(Z)$ be the cost incurred in a busy period starting from state Z and
following an arbitrary policy π' in the first step and π thereafter. Then it suffices to prove that

$$J(\pi'\pi, B)(Z) \ge J(\pi, B)(Z), \qquad \text{for all } Z \in \{0, 1, \dots\}^N. \qquad (3.11)$$

It suffices to consider the case where $\pi'(Z) = i \ne \pi(Z) = j$ with i > j. This implies that
$Z = (0, \dots, 0, Z^j, \dots, Z^i, \dots)$ with $Z^j Z^i > 0$. For simplicity, take $Z_0 = Z$. To establish (3.11),
define σ to be the first time that policy π'π serves node j. By e (respectively f) denote
the job that was served in node i at time 0 (respectively in node j at time σ). In the context of
Section 3.2, σ is the time it takes to clear the system of the descendants of job e that have priority
higher than j. Let l be the node in $(\mathbf{j-1})^c$ where job e ends up after its sojourn in $\mathbf{j-1}$, and
let x be the vector of the rest of the descendants of e after their sojourn in $\mathbf{j-1}$. Then define
σ + τ to be the time it takes to serve job f in node j and clear the system of the descendants of f
that have priority higher than j. Similarly, let f end up in $k \in (\mathbf{j-1})^c$ after its sojourn in $\mathbf{j-1}$, and
let y be the vector of the rest of the descendants of f after their sojourn in $\mathbf{j-1}$. For each sample
path that is obtained by applying policy π'π, construct a sample path where π is followed until
time τ, then any policy $\tilde\pi$ such that $\tilde\pi(Z(\tau)) = i$ is followed for one step, and π is resumed afterwards.
Denote this policy by $\pi^{(\tau)}\tilde\pi\pi$. The arrival and service processes of jobs with priority higher
than j are interchanged as in the construction of Section 3.2, i.e., the descendants of job e with
priority higher than j and the descendants of job f with priority higher than j are the same in
both realizations. The arrival and service processes of jobs with priority lower than j are the
same in both realizations. One then obtains (see Figure 3.1)

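$$J(\pi'\pi, B)(Z) - J(\pi^{(\tau)}\tilde\pi\pi, B)(Z) = c_j\big(\beta_i + T_i^{(j-1)} + E[R_i^{(j-1)}]\big) + \Big(\sum_{l \in (\mathbf{j-1})^c} p_{il}^{(j-1)} c_l + c \cdot E[x]\Big)\big(\beta_j + T_j^{(j-1)} + E[R_j^{(j-1)}]\big)$$

$$\qquad - \Big\{c_i\big(\beta_j + T_j^{(j-1)} + E[R_j^{(j-1)}]\big) + \Big(\sum_{k \in (\mathbf{j-1})^c} p_{jk}^{(j-1)} c_k + c \cdot E[y]\Big)\big(\beta_i + T_i^{(j-1)} + E[R_i^{(j-1)}]\big)\Big\}. \qquad (3.12)$$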

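To simplify this expression we need to determine E[x] and E[y] as functions of the system
parameters. For this, denote by $a_{tm}$ the expected number of jobs that enter node $m \in \mathbf{j-1}$ during
a busy cycle that starts with a job in node $t \in \mathbf{j-1}$. From Section 3.2 one then obtains $c \cdot E[x]$ as a sum,
over $i' \in (\mathbf{j-1})^c$ and $m, t \in \mathbf{j-1}$, of terms involving $c_{i'}$, $p_{mi'}$ and $a_{tm}$, and similarly for $c \cdot E[y]$.
Since every descendant of e other than e itself stems from an external arrival during the service of e
and its sojourn in $\mathbf{j-1}$, $c \cdot E[x]$ is proportional to $\beta_i + T_i^{(j-1)}$; likewise $c \cdot E[y]$ is proportional to
$\beta_j + T_j^{(j-1)}$, with the same constant of proportionality.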

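From this, relation (3.9) and some rearrangement one gets that $J(\pi'\pi, B) - J(\pi^{(\tau)}\tilde\pi\pi, B) \ge 0$ if

$$\frac{c_j - \sum_{k \in (\mathbf{j-1})^c} p_{jk}^{(j-1)} c_k}{\beta_j + T_j^{(j-1)}} \ge \frac{c_i - \sum_{k \in (\mathbf{j-1})^c} p_{ik}^{(j-1)} c_k}{\beta_i + T_i^{(j-1)}},$$

which holds by the construction of the ordering: node j was selected at step j while node i, with
i > j, was still in $(\mathbf{j-1})^c$.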
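The rearrangement can be made explicit with two pieces of shorthand that are ours, not the paper's: $\Delta_l$ for the expected time needed to serve the job at node l together with its higher-priority descendants, and $a_l$ for the expected cost rate of what that job leaves behind.

$$\Delta_l := \beta_l + T_l^{(j-1)} + E[R_l^{(j-1)}], \qquad l = i, j,$$

$$a_i := \sum_{k \in (\mathbf{j-1})^c} p_{ik}^{(j-1)} c_k + c \cdot E[x], \qquad a_j := \sum_{k \in (\mathbf{j-1})^c} p_{jk}^{(j-1)} c_k + c \cdot E[y].$$

With this notation the right-hand side of (3.12) is

$$c_j\Delta_i + a_i\Delta_j - c_i\Delta_j - a_j\Delta_i = \Delta_i\Delta_j\left(\frac{c_j - a_j}{\Delta_j} - \frac{c_i - a_i}{\Delta_i}\right),$$

so the difference is nonnegative exactly when $(c_j - a_j)/\Delta_j \ge (c_i - a_i)/\Delta_i$. By (3.9),

$$\Delta_l = \big(\beta_l + T_l^{(j-1)}\big)\Big(1 + \lambda \sum_{m=1}^N p_m\, E[B^{(j-1)} \mid e_m]\Big), \qquad l = i, j,$$

so the second factor is common to both sides of the comparison.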

To complete the proof one now argues that

$$J(\pi'\pi, B) \ge J(\pi_{(m)}, B) \to J(\pi, B) \quad (m \to \infty), \qquad (3.13)$$

by bounded convergence, where the policy $\pi_{(m)}$, m = 1, 2, ..., is defined recursively by applying the
above interchange once more to $\pi_{(m-1)}$.

Suppose now that policy π' idles for some amount of time at state Z. The above argument
then shows that π' can be improved, and thus policy π is optimal among idling policies as well.


□ 

Remark 3.2: The argument used in the above proof is a variation of an argument in
Varaiya et al [14]. We have closely followed the notation in Weiss [16], where a similar argument
appears. Our argument also gives a simple proof of the results in Foss [5], who considers a generalized
version of Klimov's problem and obtains a corresponding index rule.


The  pre-emptive  case 

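Assume now that the service time distributions are exponential at all the nodes and consider
the coupling described above, where π is now a pre-emptive policy following the same priority
assignment list as in the nonpre-emptive case. Since, by the memoryless property of the exponential
distribution, pre-empting a job does not change the distribution of its residual service time, one then
sees that $J(\pi'\pi, B) - J(\pi^{(\tau)}\tilde\pi\pi, B)$ is the same as in (3.12). It follows that π is optimal among
pre-emptive policies.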

Remark  3.3:  While  this  paper  was  under  review,  the  paper  of  Lai  and  Ying  [9]  appeared. 
There,  asymptotics  of  the  "open  bandit  problem"  are  studied  as  the  discount  factor  approaches  1. 
The  above  results  are  then  derived.  Our  approach  is  simpler  in  that  it  does  not  rely  on  previous 
results  on  multi-armed  bandits.  At  the  same  time,  the  results  in  [9]  can  be  simply  obtained  by  a 
direct  argument  similar  to  ours  (see  [8]  and  [16]). 
4. A communication link model

In  this  section  we  demonstrate  how  interchange  arguments  can  be  employed  in  problems 
that  do  not  fall  in  the  multi-armed  bandit  framework. 

The  model 

We consider the following model of a communication link. There are N channels to be
used for the transmission of telephone calls. The calls arrive according to a deterministic
sequence $\{a_k\}_k$. Each call is to be placed on one of the idle links, if one is available,
and is lost otherwise. A call placed on link i is immediately lost with probability $p_i$, and
with probability $1 - p_i$ it occupies the link for a period of time which is exponentially distributed
with parameter $\mu_i$. The system is described by the vector $Z \in \{0, 1\}^N$, where $Z^i = 1$ if a call is
present at link i and $Z^i = 0$ otherwise. A placement policy is a function

$$u : Z \mapsto u(Z) \in \{1, \dots, N\}$$

such that $Z^{u(Z)} = 0$ if $Z \ne (1, \dots, 1)$. That is, in state Z policy u will place the next arrival on
link u(Z). We have restricted ourselves to deterministic policies. Our arguments, however, easily
extend to randomized policies.

Outline 

First, the case where $\mu_i = \mu$, i = 1, ..., N, is considered. We prove that from any state Z the
time $T_Z$ it takes to reach state $(1, \dots, 1) \in \{0, 1\}^N$ has a distribution that is independent of the
policy u. This result was obtained by Smith [12] and Anantharam et al [1] by explicit computation of
the moment generating function of $T_Z$.

Next, for unequal $\mu_i$'s, the problem of stochastically maximizing $T_Z$ for any initial state Z
is considered. We prove that the optimal policy always places calls on the free channel with the
largest $\mu_i$.
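Both claims can be checked on small cases by computing the law of $T_Z$ exactly. The Python sketch below assumes, for concreteness, that calls arrive at the equally spaced instants 0, gap, 2·gap, ... (a special case of the deterministic sequence $\{a_k\}$), so that between arrivals each busy link independently stays busy with probability $e^{-\mu_l \cdot \mathrm{gap}}$. The function name and parameters are hypothetical, only priority-list policies are covered, and links are numbered from 0.

```python
import math
from collections import defaultdict
from itertools import product


def fill_time_pmf(z0, order, loss, mu, horizon, gap=1.0):
    """pmf[k] = P(T_Z = k * gap) for k < horizon, calls arriving at 0, gap, 2*gap, ...

    z0    : initial state, a tuple of 0/1 per link, assumed not all ones
    order : priority list; each call goes to the first free link in this order
    loss  : loss[l], probability that a call placed on link l is lost at once
    mu    : mu[l], rate of the exponential holding time on link l
    """
    N = len(z0)
    dist = {tuple(z0): 1.0}  # law of the (not yet full) state just before an arrival
    pmf = []
    for _ in range(horizon):
        after, hit = defaultdict(float), 0.0
        for z, pr in dist.items():
            l = next(k for k in order if z[k] == 0)  # placement u(Z)
            z1 = tuple(1 if k == l else z[k] for k in range(N))
            if all(z1):
                hit += pr * (1.0 - loss[l])  # full state reached at this arrival
            else:
                after[z1] += pr * (1.0 - loss[l])
            after[z] += pr * loss[l]  # call lost: state unchanged
        pmf.append(hit)
        dist = defaultdict(float)
        for z, pr in after.items():
            busy = [k for k in range(N) if z[k]]
            # Each busy link independently survives the gap w.p. exp(-mu_k * gap).
            for keep in product((0, 1), repeat=len(busy)):
                q, z2 = pr, list(z)
                for k, stays in zip(busy, keep):
                    s = math.exp(-mu[k] * gap)
                    q *= s if stays else 1.0 - s
                    z2[k] = stays
                dist[tuple(z2)] += q
    return pmf


# Theorem 4.1: equal rates, unequal loss probabilities, two different priority lists.
a = fill_time_pmf((0, 0, 0), [0, 1, 2], [0.1, 0.3, 0.5], [1.0, 1.0, 1.0], 40)
b = fill_time_pmf((0, 0, 0), [2, 0, 1], [0.1, 0.3, 0.5], [1.0, 1.0, 1.0], 40)
print(max(abs(x - y) for x, y in zip(a, b)))  # agrees up to rounding

# Theorem 4.2: placing calls on the faster link first should give a larger T_Z.
fast_first = fill_time_pmf((0, 0), [0, 1], [0.3, 0.3], [2.0, 0.5], 60)
slow_first = fill_time_pmf((0, 0), [1, 0], [0.3, 0.3], [2.0, 0.5], 60)
```

Comparing the cumulative sums of `fast_first` and `slow_first` shows the stochastic ordering of Theorem 4.2 on this instance; an exact computation like this is only practical for small N, since the state space has $2^N$ elements.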

4.1.  Invariance 

In this subsection it will be assumed that $\mu_1 = \mu_2 = \cdots = \mu_N = \mu$.

Theorem 4.1: For any $Z \in \{0, 1\}^N$, the distribution of $T_Z$ does not depend on the policy u.

Proof: We will use a stochastic variation of the argument used in Section 3.4. Let u be a
priority list assigning calls in the order 1, ..., N. As in the proof of Theorem 3.1, it will suffice to
show that

$$T_Z(u'u) =_{st} T_Z(u) \qquad (4.1)$$

for any policy u'. To establish (4.1) it suffices to consider the case where $Z \ne (1, \dots, 1)$ and
$u'(Z) = i \ne u(Z) = j$ with i > j. This implies that $Z^l = 1$, l = 1, ..., j - 1, and $Z^i = Z^j = 0$.

Arguing again as in the proof of Theorem 3.1, we will establish an analogue of relationship
(3.13). To this end, denote by $a_\sigma$ the first time that u'u places a call on link j, and assume for
simplicity that $a_0 = 0$. Denote the virtual service processes of the links by $(S_t^l)_{t \ge 0}$, l = 1, ..., N, and
their points by $\{s_n^l\}$. Also, for l = 1, ..., N, set $r_n^l = 1$ if the n-th trial to engage link l is a success,
and $r_n^l = 0$ otherwise.

For each sample path of $(Z_t)$ resulting from u'u we construct a sample path of a process
$(\tilde Z_t)$ as follows. Consider a policy that places a call on link j at time 0, follows policy u afterwards,
and places a call on link i at time $a_\sigma$ if $a_\sigma < s_1^i$. Then $\tilde Z_{a_\sigma} = Z_{a_\sigma}$, and the paths of $(Z_t)$
and $(\tilde Z_t)$ can be made to coincide from time $a_\sigma$ onward. This would be the obvious argument in
the case where μ = 0; it would suffice to let $\tilde r_n^l = r_n^l$, l = 1, ..., N, n = 1, 2, ....

On the other hand, if $s_1^i < a_\sigma$, the paths of $(Z_t)$ and $(\tilde Z_t)$ can again be made to coincide
from time $s_1^i$ onward. This is achieved in the construction of $(\tilde Z_t)$ by letting $\{s_n^i\}$ be the virtual
service process at link j, i.e., $\tilde s_n^j = s_n^i$. Then $\tilde Z^j_{s_1^i} = Z^i_{s_1^i} = 0$. The two cases are illustrated in
Figures 4.1(a) and 4.1(b), respectively.
Formally, in the construction of the process $(\tilde Z_t)$ the arrival process remains $\{a_k\}$ and policy u
is followed (recall u(Z) = j) until $\tau = a_\sigma \wedge s_1^i$, with

$$\tilde S_t^l = S_t^l,\ l \ne i, j,\ t \ge 0; \qquad \tilde r_n^l = r_n^l,\ l = 1, \dots, N,\ n = 1, 2, \dots; \qquad \tilde S_t^j = S_t^i,\ t < \tau.$$

If $\tau = s_1^i$, then continue with u. Otherwise, i.e., if $\tau = a_\sigma$, follow $\tilde u(\tilde Z_\tau) = i$ and then
continue with u. Denote this composite policy by $u^{(\tau)}\tilde u u$. In either case set

$$\tilde S_t^i = S_t^j, \qquad \tilde S_t^j = S_t^i, \qquad t > \tau.$$

We have thus obtained that

$$T_Z(u'u) =_{st} T_Z(u^{(\tau)}\tilde u u).$$

The proof now concludes as in Theorem 3.1.


□ 

Remark 4.1: The model considered here is similar to the well-known repairman model (see
e.g. Nash and Weber [10]). Our methods should apply to that model as well. In particular, one
should be able to obtain simply the results in Hirayama [6], where a related optimization problem
is studied.

4.2.  Optimality 

In this subsection we assume that $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_N$ and that policy u assigns calls
according to the priorities 1, ..., N.

Theorem 4.2: For any $Z \in \{0, 1\}^N$, $T_Z$ is stochastically maximized by policy u.

Proof: Again, as in Theorem 3.1, we will show that

$$T_Z(u'u) \le_{st} T_Z(u),$$

where $u'(Z) = i \ne u(Z) = j$ with i > j. With notation as in the proof of Theorem 4.1, one can
check that, using a similar construction for a process $(\tilde Z_t)$, we have

$$T_Z(u'u) \le T_Z(u^{(\tau)}\tilde u u), \quad \text{a.s.}, \qquad (4.2)$$

on some probability space, where the stopping time τ remains to be specified.

The construction of $(\tilde Z_t)$ only differs from the one in the proof of Theorem 4.1 in that
$\tilde S^j[0, \tau]$ is a superset of $S^i[0, \tau]$. This can be done since $\mu_j \ge \mu_i$. Also, note that in this case it
suffices to define τ as

$$\tau = \inf\{t \ge 0 \mid Z_t \ge \tilde Z_t\},$$

where the inequality is component-wise.
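The superset property used here, $\tilde S^j[0, \tau] \supseteq S^i[0, \tau]$ when $\mu_j \ge \mu_i$, can be realized by superposition: take the rate-$\mu_i$ points and add an independent Poisson stream of rate $\mu_j - \mu_i$. The Python sketch below is one way to do this (thinning the faster process would be an equivalent alternative); the function name and parameters are illustrative.

```python
import random


def coupled_service_points(mu_slow, mu_fast, horizon, seed=0):
    """Points on [0, horizon) of two Poisson processes with rates mu_slow <= mu_fast,
    built so that every point of the slow process is also a point of the fast one."""
    if mu_slow > mu_fast:
        raise ValueError("need mu_slow <= mu_fast")
    rng = random.Random(seed)

    def poisson_points(rate):
        points, t = [], 0.0
        if rate <= 0.0:
            return points
        while True:
            t += rng.expovariate(rate)
            if t >= horizon:
                return points
            points.append(t)

    slow = poisson_points(mu_slow)
    extra = poisson_points(mu_fast - mu_slow)  # independent extra stream
    # The superposition of independent Poisson processes is Poisson with the summed rate.
    fast = sorted(slow + extra)
    return slow, fast


slow, fast = coupled_service_points(0.5, 2.0, horizon=1000.0, seed=1)
print(len(slow), len(fast), set(slow) <= set(fast))
```

Each departure from link i in the original path is then matched by a departure from link j in the constructed path, which is what lets the constructed process empty its faster link no later.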


□ 